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Abstract. In this paper, we generalize Spencer's hyperbolic cosine algorithm to the matrix-valued 
setting. We apply the proposed algorithm to several problems by analyzing its computational efficiency 
under two special cases of matrices; one in which the matrices have a group structure and an other in 
which they have rank-one. As an application of the former case, we present a deterministic algorithm 
that, given the multiplication table of a finite group of size n, it constructs an expanding Cayley 
graph of logarithmic degree in near-optimal 0{n^ log^ n) time. For the latter case, we present a fast 
■ deterministic algorithm for spectral sparsification of positive semi-definite matrices, which implies an 

' improved deterministic algorithm for spectral graph sparsification of dense graphs. In addition, we 

give an elementary connection between spectral sparsification of positive semi-definite matrices and 
• element-wise matrix sparsification. As a consequence, we obtain improved element-wise sparsification 

' algorithms for diagonally dominant-like matrices. 
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1 Introduction 



q 

^ I A non-trivial generalization of Chernoff bound type inequalities for matrix-valued random variables was 

introduced by Ahlswede and Winter In parallel, Vershynin and Rudelson introduced similar matrix- 



valued concentration inequalities using different machinery |30I31| . Following these two seminal papers, 
many variants have been proposed in the literature [52]; see [12] for more. Such inequalities, similarly to their 
real-valued ancestors, provide powerful tools to analyze probabilistic constructions and the performance of 
^j>^ ■ randomized algorithms. There is a rapidly growing line of research exploiting the power of these inequalities 

— I including new proofs of probabilistic constructions of expander graphs |3I24I26| . matrix approximation by 

04 ■ element-wise sparsification [14] , graph approximation via edge sparsification |38| , analysis of algorithms for 

I matrix completion and decomposition of low rank matrices |29I25| , semi-definite relaxation and rounding of 

■ quadratic maximization problems }32| . 
In many settings, it is desirable to convert the above probabilistic proofs into efficient deterministic 

procedures. That is, to derandomize the proofs. Wigderson and Xiao presented an efficient derandomization 
of the matrix Chernoff bound by generalizing Raghavan's method of pessimistic estimators to the matrix- 
valued setting |44| . In this paper, we generalize Spencer's hyperbolic cosine algorithm to the matrix- valued 
setting [33| . In an earlier, preliminary version of our paper [45] the generalization of Spencer's hyperbolic 

■ cosine algorithm was also based on the method of pessimistic estimators. However, here we present a proof 
which is based on a simple averaging argument. Next, we carefully analyze two special cases of matrices; one 
in which the matrices have a group structure and the other in which they have rank-one. We apply our main 
result to the following problems: deterministically constructing Alon-Roichman expanding Cayley graphs, 
approximating graphs via edge sparsification and approximating matrices via element-wise sparsification. 

The Alon-Roichman theorem asserts that Cayley graphs obtained by choosing a logarithmic number 
of group elements independently and uniformly at random are expanders [5]. The original proof of Alon 
and Roichman is based on Wigner's trace method, whereas recent proofs rely on matrix-valued deviation 
bounds [24j . Wigderson and Xiao's derandomization of the matrix Chernoff bound implies a deterministic 
0(71"* log n) time algorithm for constructing Alon-Roichman graphs. Independently, Arora and Kale gener- 
alized the multiplicative weights update (MWU) method to the matrix-valued setting and, among other 
interesting implications, they improved the running time to ©(n'^polylog (n)) |22j . Here we further improve 
the running time to 0{n'^ log^ n) by exploiting the group structure of the problem. In addition, our algorithm 
is combinatorial in the sense that it only requires counting the number of all closed (even) paths of size at 



most 0{\ogn) in Cayley graphs. All previous algorithms involve numerical matrix computations such as 
eigenvalue decompositions and matrix exponentiation. 

The second problem that we study is the graph sparsification problem. This problem poses the question 
whether any dense graph can be approximated by a sparse graph under different notions of approxima- 
tion. Given any undirected graph, the most well-studied notions of approximation by a sparse graph include 
approximating, all pairwise distances up to an additive error |28| . every cut to an arbitrarily small multi- 
plicative error [8] and every eigenvalue of the difference of their Laplacian matrices to an arbitrarily small 
relative error |37| : the resulting graphs are usually called graph spanners, cut sparsifiers and spectral spar- 
sifiers, respectively. Given that the notion of spectral sparsification is stronger than cut sparsification, so 
we focus on spectral sparsifiers. An efficient randomized algorithm to construct an (1 + e)-spectral sparsifier 
with ©(nlogn/e^) edges was given in ^5]. Furthermore, an (1 -I- e)-spectral sparsifier with ©(n/e^) edges 
can be computed in 0{mn^ / e^) deterministic time [7]. The latter result is a direct corollary of the spectral 
sparsification of positive semi-definite (psd) matrices problem as defined in [5S] ; see also |17] for more ap- 
plications. Here we present a fast deterministic algorithm for spectral sparsification of psd matrices and, as 
a consequence, we obtain an improved deterministic spectral graph sparsification algorithm for the case of 
dense graphs. 

The last problem that we analyze is the element-wise matrix sparsification problem. This problem was 
first introduced by Achlioptas and McSherry in [T]. They described sampling-based algorithms that select a 
small number of entries from an input matrix A, forming a sparse matrix A, which is close to A in the operator 
norm sense. The motivation to study this problem lies on the need to speed up several matrix computations 
including approximate eigenvector computations yy and semi-definite programming solvers |4I13[ . Recently, 
there are many follow-up results on this problem |5ll4j . To the best of our knowledge, all known algorithms 
for this problem are randomized (see Table 1 of |14|'). In this paper we present the first deterministic algorithm 
and strong sparsification bounds for self-adjoint matrices that have an approximate diagonally dominan10 
property. Diagonally dominant matrices arise in many applications such as the solution of certain elliptic 
differential equations via the finite element method , several optimization problems in computer vision |23| 
and computer graphics |21] , to name a few. 

Organization of the Paper. The paper is organized as follows. In §[51 we present the matrix hyperbolic cosine 
algorithm (Algorithm [1} . We apply the matrix hyperbolic cosine algorithm to derive improved deterministic 
algorithms for the construction of Alon-Roichman expanding Cayley graphs in § [Sj spectral sparsification of 
psd matrices in §|3]and element-wise matrix sparsification in §[51 Due to space constraints, almost all proofs 
have been deferred to the Appendix. 

Our Results 

The main contribution of this paper is a generalization of Spencer's hyperbolic cosine algorithm to the 
matrix- valued setting [33j , |361 Lecture 4] , see Algorithm [H As mentioned in the introduction, our main 
result has connections with a recent derandomization of matrix concentration inequalities |44| . We should 
highlight a few advantages of our result compared to |44| . First, our construction does not rely on composing 
two separate estimators (or potential functions) to achieve operator norm bounds and does not require 
knowledge of the sampling probabilities of the matrix samples as in [44] . In addition, the algorithm of [44] 
requires computations of matrix expectations with matrix exponentials which are computationally expensive, 
see [m Footnote 6, p. 63]. In this paper, we demostrate that overcoming these limitations leads to faster 
and in some cases simpler algorithms. 

Next, we demonstrate the usefulness of the main result by analyzing its computational efficiency under 
two special cases of matrices. Wc begin by presenting the following result 

Theorem 1 (Restatement of Theorem [5]). There is a deterministic algorithm that, given the multipli- 
cation table of a group G of size n, constructs an Alon-Roichman expanding Cayley graph of logarithmic 

^ A self-adjoint matrix A of size n is called diagonally dominant if \Aii\ > \Aij\ for every i G [n]. 
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degree in 0{n^log^ n) time. Moreover, the algorithm performs only group algebra operations that correspond 
to counting closed paths in Cayley graphs. 

To the best of our knowledge, the above theorem improves the running time of all previously known deter- 
ministic constructions of Alon-Roichman Cayley graphs j6l44l22j . Moreover, notice that the running time of 
the above algorithm is optimal up-to poly-logarithmic factors since the size of the multiplication table of a 
finite group of size n is O(n^). 

In addition, we study the computational efficiency of the matrix hyperbolic cosine algorithm on the case 
of matrix samples with rank-one. The motivation for studying this special setting is its connection with 
problems such as graph approximation via edge sparsification as was shown in |7|39j and matrix approxima- 
tion via element-wise sparsification as we will see later in this paper. The main result for this setting can be 
summarized in the following theorem (see §15), which improves the 0{mn^/£^) running time of [3^ when, 
say, TO = f2[n'^) and e is a constant. 

Theorem 2. Suppose < £ < 1 and A = X]"=i ''^i ® '^'"^ given, with column vectors Vi £ R". Then there 
are non-negative real weights {si}i<m, at most [n/e^] of which are non-zero, such that 

{l-efAdiAdi {l+e)^A, 

where A = X^I^i ■^j^i ^i- Moreover, there is a deterministic algorithm which computes the weights Si irj^ 
0(TOn^ log'^ n/e^ -I- n'' logn/e^) time. 

First, as wc have already mentioned the graph sparsification problem can be reduced to spectral sparsification 
of positive semi-definite matrix. Hence as a corollary of the above theorem (proof omitted, see |39j for details) , 
we obtain a fast deterministic algorithm for sparsifying dense graphs, which improves the currently best 
known Oirc' je^') running time for this problem. 

Corollary 1. Given a weighted dense graph H ~ (Vy E) on n vertices with positive weights and < 
£ < 1, there is a deterministic algorithm that returns an {1 -\- e) -spectral sparsifier with 0{n/e'^) edges 
in 0{n^ log n/e"^ maxjlog^ n, 1/e^}) time. 

Second, we give an elementary connection between element-wise matrix sparsification and spectral spar- 
sification of psd matrices. A direct application of this connection implies strong sparsification bounds for 
self-adjoint matrices that are close to being diagonally dominant. More precisely, wc give two element-wise 
sparsification algorithms for sclf-adjoint and diagonally dominant-like matrices; in its randomized and the 
other in its derandomized version (see Table 1 of [M] for comparison). Here, for the sake of presentation, we 
state our results for diagonally dominant matrices, although the results hold under a more general setting 
(see §[5] for details). 

Theorem 3. Let A be any self-adjoint and diagonally dominant matrix of size n and < e < 1. Assume 
for normalization that \\A\\ = 1. 

(a) There is a randomized linear time algorithm that outputs a matrix A G K"^" with at most ©(nlogn/e^) 



non-zero entries such that, with probability at least 1 — 1/n, 



A- A 



< e. 



(b) There is a deterministic 0(e ^nnz (A) log nmaxjlog^ n, 1/e^}) time algorithm that outputs a matrix 



A G R"^" with at most Oinle^) non-zero entries such that 



A- A 



< e. 



Preliminaries. The next discussion reviews several definitions and facts from linear algebra; for more details, 
see [9]. By [n] to be the set {1,2, ... ,n}. We denote by 5"^" the set of symmetric matrices of size n. 
Let X e M", we denote by diag (x) the diagonal matrix containing xi,X2, . . . ,Xn. For a square matrix 
M, we also write diag (M) to denote the diagonal matrix that contains the diagonal entries of M. Let 
A be an TO X n matrix. A'-'^ will denote the j-th column of A and the i-th row of A. We denote 



The 0{-) notation hides log log n and loglog(l/£) factors throughout the paper. 
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Pll = max{||Aa;|| | = 1}, = n-ia.Xi^[^]J2je[n] l^^il ^^^^ by ||A||p = ^Y.i^j A}^ the Frobcnius 

norm of A. Also sr {A) :~ ||^||p / ||^||^ is the stable rank of A and by nnz {A) the number of its non-zero 
entries. The trace of a square matrix B is denoted as tr [B). We write J„ for the all-ones square matrices of 
size n. For two self-adjoint matrices X, F , we say that Y > X \i and only if y — X is a positive semi-definite 
(psd) matrix. Let x G M", then x®x is the nxn matrix such that {x<SS)x)ij = XiXj. Given any matrix A, its 

r A] 

dilation is defined as V {A) = .-p . It is easy to see that Xmax{T^ {A)) = \\A\\, see e.g. [301 Theorem 4.2]. 

j\ u 

Functions of Matrices. Here we review some basic facts about the matrix exponential and the hyperbolic 
cosine function, for more details see [TH]. All proofs of this section have been deferred to the appendix. The 
matrix exponential of a self-adjoint matrix A is defined as exp 

[^] = I + Er=i 4r- Let A = QAQ^ be the 
eigendecomposition of A. It is easy to see that exp [A] = Qexp [A\ . For any real square matrices A and 
B of the same size that commute, i.e., AB = BA, we have that exp [A + B] = exp [A] exp [B]. In general, 
when A and B do not commute, the following estimate is known for self-adjoint matrices. 

Lemma 1. U6'\41^ For any self-adjoint matrices A and B, tr{exp[A + B]) < tr{exp[A] exp[B]). 

We will also need the following fact about matrix exponential for rank one matrices. 

Lemma 2. Let x be a non-zero vector in K" . Then exp [x (E) x] = I„ + 2^^^p-!-.TC>$x. Similarly, exp [—x x] = 

Let us define the matrix hyperbolic cosine function of a self-adjoint matrix A as cosh [A] := (exp [A] + 
exp [—A])/2. Next, we state a few properties of the matrix hyperbolic cosine. 

Lemma 3. Let A be a self-adjoint matrix. Then tr{exp['D (A)]) = 2tr{cosh[A]). 

Lemma 4. Let A be a self-adjoint matrix and P be a projector matrix that commutes with A, i.e., PA = AP. 
Then cosh [PA] = Pcosh [^] + I - P. 

Lemma 5. 143\ Lemma 2.2/ For any positive semi-definite self-adjoint matrix A of size n and any two 
self-adjoint matrices B,C of size n, B < C implies tr[AB) < tr[AC). 

2 Balancing Matrices: a matrix hyperbolic cosine algorithm 

We briefly describe Spencer's balancing vectors game and then generalize it to the matrix- valued setting |36l 
Lecture 4]. Let a two-player perfect information game between Alice and Bob. The game consists of n 
rounds. On the i-th round, Alice sends a vector Vi with Hwillo^ < 1 to Bob, and Bob has to decide on a sign 
Si G {±1} knowing only his previous choices of signs and {vk}k<i. At the end of the game. Bob pays Alice 
lEiLi '^i'^^'lloo- '^^^^ latter quantity, the value of the game. 

It has been shown in |35| that, in the above limited online variant. Spencer's six standard deviations 
bound [33] does not hold and the best value that we can hope for is f2{-\/nln n). Such a bound is easy to 
obtain by picking the signs {si\ uniformly at random. Indeed, a direct application of Azuma's inequality to 
each coordinate of the random vector X]r=i ^i'^i together with a union bound over all the coordinates gives 
a bound of 0{-\/nhin). 

Now, we generalize the balancing vectors game to the matrix-valued setting. That is, Alice now sends 
to Bob a sequence {Mi\ of self-adjoint matrices of size n witlH < 1, and Bob has to pick a sequence 

of signs {si} so that, at the end of the game, the quantity is as small as possible. Notice that 

the balancing vectors game is a restriction of the balancing matrices game in which Alice is allowed to send 
only diagonal matrices with entries bounded in absolute value by one. Similarly to the balancing vectors 
game, using matrix-valued concentration inequalities, one can prove that Bob has a randomized strategy 
that achieves at most 0{^/n\nn) w.p. at least 1/2. Indeed, 



^ A curious reader may ask him/her-self why the operator norm is the right choice. It turns out the the operator 
norm is the correct matrix-norm analog of the loo vector-norm, viewed as the infinity Schatten norm on the space 
of matrices. 
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Lemma 6. Let Mi G j5"X"^ 11^/^1] < 1, 1 < i < n. Pick s* € {±1} uniformly at random for every i E [n]. 
Then \\Y.7=i = 0{Vnlnn) w.p. at least 1/2. 

Now, let's assume that Bob wants to achieve the above probabiHstic guarantees using a deterministic strat- 
egy. Is it possible? We answer this question in the affirmative by generalizing Spencer's hyperbolic cosine 
algorithm (and its proof) to the matrix-valued setting. We call the resulting algorithm matrix hyperbolic 
cosine (Algorithm [ij . It is clear that this simple greedy algorithm implies a deterministic strategy for Bob 
that achieves the probabilistic guarantees of Lcmma|6](set fj ~ SjMj, t = n and e = 0{y^lnn/n) and notice 
that 7, are at most one). 

Algorithm [T] requires an extra assumption on its random matrices compared to Spencer's original al- 
gorithm. That is, we assume that our random matrices have uniformly bounded their "matrix variance" , 
denoted by p^. This requirement is motivated by the fact that in the applications that are studied in this 
paper such an assumption translates bounds that depend quadratically on the matrix dimensions to bounds 
that depend linearly on the dimensions. 

We will need the following technical lemma for proving the main result of this section, which is a Bernstein 
type argument generalized to the matrix- valued setting |42| . 

Lemma 7. Let f : [m] — > S"^" with |j/(j)|j < 7 for all i S [m] . Let X be a random variable over [m] such that 
Ef{X)=Oand\\Ef{X)^\\<p^. Then,foranye>0,\\E[exp[V{ef{X))]]\\ < exp (p2(e»')' - 1 - 6*7)772) . 

In particular, for any < e < 1, setting 6 ~ e/7 implies that E[ea;p [2? (£/(X)/7)]] p 1'^ ^2n- 
Now we are ready to prove the correctness of the matrix hyperbolic cosine algorithm. 

Algorithm 1 Matrix Hyperbolic Cosine 

1: procedure MATRlx-HYPERBOLlc({/j}, e, t) > fj : [m] — 5> as in Theorem|4l < e < 1. 

2: Set Q = e/7 

3: for i = \ \,o t do 

4: Compute x* e \m\. x* = arg minj^g[,„j tr (^cosh [6'Ej=l lA'^'j) + ^/iC^)] ) 

5: end for 

6: Output: t indices x\,x\, . . . ,xl such that j|j X]j=i /j('''^i)|| — '"/^^"^ 4- 
7: end procedure 



Theorem 4. Let fj : [m] gnxn ^j^-y^ ||/j(i)|| < 7 for all i E [m] and j ~ 1, 2, . . .. Suppose that there exists 



independent random variables Xi,X2, ■ ■ ■ over [m] such thatM fj{Xj) = and fj(Xj) 
with input {/,},£, i outputs a set of indices {x*}jfz[t] over [m] such that j X]j"=i fji^*j) 



< p^ . Algorithm[J\ 

^ 7ln(2n) ep^ 
— te 7 ' 



We conclude with an open question related to Spencer's six standard deviation bound |33]. Does Spencer's 
six standard deviation bound holds under the matrix setting? More formally, given any sequence of n self- 
adjoint matrices {Mi} with \\Mi\\ < 1, does there exist a set of signs {si} so that ■^i^i II = 0{^/n)l 



3 Alon-Roichman Expanding Cayley Graphs 

We start by describing expander graphs. Given a connected undirected d-rcgular graph H ~ (V, E) on n 
vertices, let A be its adjacency matrix, i.e., Aij — Wij where Wij is the number of edges between vertices i 
and j. Moreover, let A = be its normalized adjacency matrix. We allow self-loops and multiple edges. 
Let Xi{A), . . . ,Xn{A) be its eigenvalues in decreasing order. We have that Ai(A) = 1 with corresponding 
eigenvector l/v^, where 1 is the all-one vector. The graph H is called a spectral expander if X{A) := 
max2<j{|Aj (v4)|} < e for some positive constant e < 1. 

Denote by = mk{H) := tr (^*'). By definition, rrife is equal to the number of self-returning walks 
of length k of the graph H . A graph-spcctrum-based invariant, recently proposed by Estrada is defined as 
EE{A) := tr (exp [A]) [TS], which also equals to J^'kLo ''^k/kl. For 6* > 0, we define the even 9-Estrada index 
by ^£;evcn(A^) := Er=o"^2fc(M)/(2fc)!. 
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Now let G be any finite group of order n with identity element id. Let S he a multi-set of elements 
of G, we denote by S' U the symmetric closure of S, namely the number of occurrences of s and 
in 5 U equals the number of occurrences of s e S. Let R be the right regular reprcsentatiorQ, i.e., 
{R{gi)4'){g2) ~ 0(31.92) for every : G M and (71,52 G G. The Cayley graph Cay (G; 5") on a group G 
with respect to the mutli-set 5 C G is the graph whose vertex set is G, and where gi and 52 arc connected 
by an edge if there exists s € S such that g2 = gis (allowing multiple edges for multiple elements in S). In 
this section wc prove the correctness of the following greedy algorithm for constructing expanding Cayley 
graphs. 

Theorem 5. Algorithm\^ given the multiplication table of a finite group G of size n and < £ < 1, outputs 
a (symmetric) multi-set S C G of size 0{\ogn/e^) such that X{Ca.y {G; S)) < e in 0{n'^ \og^ n/e^) time. 
Moreover, the algorithm performs only group algebra operations that correspond to counting closed paths in 
Cayley graphs. 



Algorithm 2 Expander Cayley Graph via even Estrada Index Minimization 

procedure GreedyEstradaMin(G, e) t> Multiplication table of G, < e < 1 

Set S^°^ = and t = 0{logn/e^) 
for i — 1, . . .t do 

Let ,g, € G that (approximately) min. the even e/2-Estrada index of Cay ^G; S''^^' U g U g^^^ over all 
g €z G t> Use Lemma [9] 

5: Set S'-'^ = S'""'' Ug,U g-^ 

end for 

Output: A multi-set 5 := S*'*' of size 2t such that A(Cay (G; S)) < e 
end procedure 

Let A be the normalized adjacency matrix of Cay (G; S U S'~^) for some 5 C G. It is not hard to see that A = 
X^isGS (^(*) + ^(■s~^))- We want to bound A(^). Notice that \{A) = ||(I — J/ri)A||. Since we want to ana- 
lyze the second-largest eigenvalue (in absolute value), we consider (I— J/n)A = Ssgs (-^(*) + ^(■5^^))/2— 
J/n. Based on the above calculation, we define our matrix- valued function as 

f{g):={R{g) + R{g-'))/2-3/n (1) 

for every g Cz G. The following lemma connects the potential function that is used in Theorem |4] and the 
even Estrada index. 

Lemma 8. Let S C G and A be the adjacency matrix of Gay (G; S U S~^^ . For any 9 > 0, tr (^cosh [9 X^ses /( 
EEe,eniA,9/2) + 1 ~ cosh{9\S\). 

The following lemma indicates that it is possible to efficiently compute the (even) Estrada index for Cayley 
graphs with small generating set. 

Lemma 9. Let S d G, 9,6 > 0, and A be the adjacency matrix of Cay {G;S). There is an algorithm that, 
given S, computes an additive S approximation to EE{9A) or EEeveniA, 9) in 0{n\S\ max{log(n/(5), 2e^|5'|0}) 
time. 

Proof, (of Theorem[5]) By Lemma[8l minimizing the even e/2-Estrada index in the i-th iteration is equivalent 
to minimizing tr (cosh [9J2s€S('-^) /(«) + ^fid)]) over aU g e G with 9 = e. Notice that f{g) e 5"^" for 
g e G, Egg^G fig) ~ 0„ since J2geG ^id) ^ ^^^^^ ll/(.9)ll — 2 and moreover a calculation 

implies that HEgg^c /(5)'^|| < 2 as well. Theorem 2] implies that we get a multi-set S of size t such that 

A(Cay (G; 5* U S*"^)) = X^ses /(*) — ^- The moreover part follows from Lemma [5] with (5 = ^ for 
a sufficient large constant c > 0. Indeed, in total we incur (following the proof of Theorem 2]) at most an 
additive ln((5ne^ *)/£ error which is bounded by e. 

* In other words, represent each group algebra element with a permutation matrix of size n that preserves the group 
structure. This is always possible due to Cayley's theorem. 
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4 Fast Isotropic Sparsification and Spectral Sparsification 



Let A be an TO X n matrix with m ^ n whose columns are in isotropic position, i.e., A = I„. For 
< e < 1, consider the problem of finding a small subset of (rescaled) rows of A forming a matrix A such 



that 



A'^A-I 



< s. The matrix Bernstein inequality (see 



tells us that there exists such a set with size 

2 



0(nlogn/£^). Indeed, set f{i) = y4(j) ® — I„ where pi = ||^(i) || / H^Hp- A calculation shows that 7 

and are 0{n). Moreover, Algorithm [T] implies an ©(mn^ log n/e^) time algorithm for finding such a set. 
The running time of Algorithm [T] for rank-one matrix samples can be improved to ©(mn'^polylog (n) /e^) by 
exploiting their rank-one structure. More precisely, using fast algorithms for computing all the eigenvalues 
of matrices after rank-one updates [18) . Next we show that we can further improve the running time by a 
more careful analysis. 

We show how to improve the running time of Algorithm [1] to ©(^^^p- poly log (n, i)) utilizing results from 
numerical linear algebra including the Fast Multipole Method [T^ (FMM) and ideas from [TH]. The main idea 
behind the improvement is that the trace is invariant under any change of basis. At each iteration, we perform 
a change of basis so that the matrix corresponding to the previous choices of the algorithm is diagonal. Now, 
Step 4 of Algorithm [1] corresponds to computing all the eigenvalues of m different eigensystems with special 
structure, i.e., diagonal plus a rank-one matrix. Such eigensystem can be solved in 0(npolylog (n)) time using 
the FMM as was observed in [18]. However, the problem now, is that at each iteration we have to represent 
all the vectors in the new basis, which may cost 0{mn?). The key observation is that the change of basis 
matrix at each iteration is a Cauchy matrix (see Appendix) . It is known that matrix- vector multiplication with 
Cauchy matrices can be performed efficiently and numerically stable using FMM. Therefore, at each iteration, 
we can perform the change of basis in ©(mnpolylog (n)) and to eigenvalue computations in 0(TOnpolylog (n)) 
time. The next theorem states that the resulting algorithm runs in 0(TO7i^polylog (n)) time (see Appendix 
for proof). 

Theorem 6. Let A he an m x n matrix with A^ A = I„, m > n and < e < 1. Algorithm re- 
turns at most t = ©(nlnn/e^) indices x*i,X2-, ■ • ■ a;^ over [to] with corresponding scalars si,S2, ■ • ■ , St using 
O {mn^ log'^ n/ e^) operations such that 



< e. 



(2) 



Algorithm 3 Fast Isotropic Sparsification 



1: procedure Isotrop(A, e) t> A € R"''", Af^k) ® ^(fc) = In and < £ < 1 
2: Set 9 = e/n, t = 0{nlnn/e^), and A^^) ^ ^{fc)/\/Pfe for every k £ [m], where pk = ||A(fc)||^ /n 
3: Set yl{o} = 0„ and Z = VO A 
4: for i = 1 to t do 

5: X* = arg minj.g[„] tr (exp + ^(fc) ® ^(fc)] e"®' + exp [-/!{, _i} - Z^k) ® Z^k)] e"') > Apply m 
times Lemma [T2I 

6: !7{i}] = eigs(yl{i_i} -f .^(i*) ® -^(i*)) i> eigs computes eigensystem 

7: Z = ZU[i} t> Apply fast matrix-vector multiplication 
end for 
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Output: t indices x\,X2, ■ ■ ■ ,xl, x* £ [m] s.t. 
end procedure 



< £ 



Next, we show that Algorithm [3] can be used as a bootstrapping procedure to improve the time complexity 
of [391 Theorem 3.1], see also [7j Theorem 3.1]. Such an improvement implies faster algorithms for constructing 
graph sparsifiers and, as we will see in § [SI element- wise sparsification of matrices. 

Theorem 7. Suppose < e < 1 and A ~ X]i=i ® '^i '^'"^ given, with column vectors Vi e R" and m > n. 
Then there are non-negative weights {si}i<m, o,t most \n/e'^~\ of which are non-zero, such that 

{l-efA^A^{l+efA, (3) 
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where A = X^I^li •SiWi'X'i'i. Moreover, there is an algorithm that computes the weights {si},;<,„ in deterministic 
0{mn^ log'^ n/e^ + logn/e'^) time. 

5 Element-wise Matrix Sparsification 

A deterministic algorithm for the element-wise matrix sparsification problem can be obtained by derandom- 
izing a recent result whose analysis is based on the matrix Bernstein inequality jl4j . 

Theorem 8. Let A be an n x n matrix and < e < I. There is a deterministic polynomial time algorithm 
that, given A and e, outputs a matrix A G R"^" with at most 28nln{^/2n)sr{A) je^ non-zero entries such 



that 



A- A 



<e\\A\\. 



Next, we give two improved element-wise sparsification algorithms for self-adjoint and diagonally dominant- 
like matrices; one of them is randomized and the other is its derandomizcd version. Both algorithms share 
a crucial difference with all previously known algorithms for this problem; during their execution they may 
dcnsify the diagonal entries of the input matrices. On the one hand, there are at most n diagonal entries, 
so this does not affect asymptotically their sparsity guarantees. On the other hand, as we will see later this 
twist turns out to give strong sparsification bounds. 

Recall that the results of |38l7j imply an element-wise sparsification algorithm that works only for Lapla- 
cian matrices. It is easy to verify that Laplacian matrices are also diagonally dominant. Here we extend 
these results to a wider class of matrices (with a weaker notion of approximation) . The diagonally dominant 
assumption is too restrictive and we will show that our sparsification algorithms work for a wider class of 
matrices. To accommodate this, we say that a matrix A is 0-symnictric diagonally dominant (abbreviate by 
6'-SDD) if A is self-adjoint and the inequality \\A\\^ < y/O \\A\\ holds. 

By definition, any diagonally dominant matrix is also a 4-SDD matrix. On the other extreme, every 
self-adjoint matrix of size n is n-SDD since the inequality H^Hq^ < is always valid. The following 

elementary lemma gives a connection between element-wise matrix sparsification and spectral sparsification 
as defined in [39] . 

Lemma 10. Let A be a self-adjoint matrix of size n and R = diag (i?i , i?2 , • ■ • , Rn) where Ri = "^j^i I • 
Then there is a matrix C of size n x m with m < (2) such that 

A = CC'^ + diag (A) - R. (4) 

Moreover, each column of C is indexed by the ordered pairs {i,j), i < j and equals to C^*'-'-' — y^\Aij\ei -\- 
sgn(Ajj) y/\A~\ej for every i < j, i,j e [n]. 

Remark L In the special case where A is the Laplacian matrix of some graph, the above decomposition is 
precisely the vertex-edge decomposition of the Laplacian matrix, since in this case diag (A) ~ R. 

Using the above lemma, we give a randomized and a deterministic algorithm for sparsifying 0-SDD matrices. 
First we present the randomized algorithm. 

Theorem 9. Let A be a 6-SDD matrix of size n and < e < 1. There is a randomized linear time algorithm 
that, given A, \\A\\ and e, outputs a matrix A G M"^" with at most O{n0\ogn/s^) non-zero entries such 

that w.p. at least 1 — 1/n, A — A < e \\A\\ . 

Next we state the derandomizcd algorithm of the above result. 

Theorem 10. Let A be a 9-SDD matrix of size n and < e < 1/2. There is an algorithm that, given A 
and e, outputs a matrix A S M"'^" with at most O{n0/e'^) non-zero entries such that A^A < e\\A\\. 

Moreover, the algorithm computes A in deterministic O {nnz (A) 6 \og^ n/e^ -\- n^9'^ \ogn/e'^) time. 
Remark 2. The results of |7I39| imply a deterministic O {nnz {A) 9n^ /e^) time algorithm that outputs a 
matrix A with at most [19(1 -|- VO)^ / e'^~\n non-zero entries such that A — A < e \\A\\. 
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Appendix 

Fast Multiplication with Cauchy Matrices and Special Eigensystems 

We start by defining the so-called Cauchy (generalized Hilbert) matrices. An m x n matrix C defined by 

— , ie[m],j e[n], 

Li Sj 

where t = {ti, . . . , tm), t G M™ and s = (si, . . . , s„), s € M" and ti ^ Sj for all i € [m] and j G [n] is called 
Cauchy. Given a vector x € M", the naive algorithm for computing the matrix- vector product Cx requires 
0{mn) operations. It is not clear if it is possible to perform this computation in less than 0{mn) operations. 
Surprisingly enough, it is possible to compute this product with 0{{m + n)log'^{m + n)) operations. This 
computation can be done by two different approaches. The first one is based on fast polynomial multiplication, 
polynomial interpolation and polynomial evaluation at distinct points [101 Algorithm 1, p. 130]. The main 
drawback of this approach is its numerical instability. The second approach is based on the so-called Fast 
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Multipole Method (FMM) introduced in [T^]. This method returns an approximate solution to the matrix- 
vector product for any given error parametei0. Ignoring numerical issues that are beyond the scope of this 
work, wc summarize our discussion to the following 

Lemma 11. \10\12f Let x £ R" and C be a Cauchy matrix defined as above with t e M™, s € R". There is 
an algorithm that, given vectors s, t, x, computes the product Cx using 0{{m + n) log^(TO + n)) operations. 

Given a self-adjoint matrix B = S + pu <S) u, where S = diag (ci, . . . , cr„), p > and u e R" is a unit 
vector, our goal is to efficiently compute all the eigenvalues of B. It is well-known that the eigenvalues of 
B are the roots of a special function, known as secular function |17j and are interlaced with {ai}i<n- In 
addition, evaluating the secular function requires 0{n) operations implying that a standard (Newton) root- 
finding procedure requires 0{n) operations per each eigenvalue. Hence, 0{n'^) operations are required for all 
eigenvalues. In their seminal paper jl8j . Gu and Eisenstat showed that it is possible to encode the updates of 
the root-finding procedure for all eigenvalues as matrix- vector multiplication with an n x n Cauchy matrix. 
Based on this observation, they showed how to use the Fast Multipole Method for approximately computing 
all the eigenvalues of this special type of eigenvalue problem. 

Lemma 12. Let e N, p > 0, 17 = diag{<7i,a2, ■ . ■ ,<Jn) md u G R" be a unit vector. There is an 
algorithm that given IJ,p,u computes all the eigenvalues of B — U + pu^u within an additive error 2^^ \\B\\ 
in 0{n \og^ n log b) operations. 



Omitted Proofs 



Proof, (of Lemma [2|) The proof is immediate by the definition of the matrix exponential. Notice that {x ( 

(x ® x)'' 



x)'' = \\x\\^^'' ^' X <^ X k > 1. 



exp [a; ® x] = I + ■ 



fc=i 



kl 



CO II ||2(fe-l) ^ 

\\'x\ ' X® X 



fc=l 



fc! 



-X ^ X. 



Similar considerations give that exp [— x ® x\ —\ 



-1 — X X. 



Proof (of Lemma El) Set B V (A) 



B 



2k+l 



^2k+l Q 



A 




. Notice that for any integer k > 1, B 



2k 



^2fc ■ 





and 



Since the odd powers of B are trace-less, it follows that 

2k Q2k+l \ 



tr(exp [B]) =tr l2„ + ^ 



B 



tr l2n + E 



B 



2k 



(°° j^2k 



= tr(exp[yl] +exp[-^]) = 2tr (cosh [A]) . 



Proof, (of Lemma |4|) By the definition of cosh [•], it suffices to show that exp [PA] = Pexp [A] -f I — P, 
explPAJ^I + J.^^"^)' 



fc=l 



fc! 



OO . ^ 

= I + -rj = Pexp [^] + I - P. 



fc=l 



^ That is, given an n x n Cauchy matrix, a vector x £ R" and < e < 1, returns a vector y £ R" so that 
lly — Cxll^ < £ in time C'(nlog^(l/£)). In an actual implementation, setting e to be a small constant relative 
to the machine's (numerical) precision suffices; see [181 § 3] for a more careful implementation and discussion on 
numerical issues. 
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Proof, (of Lemma in]) We wish to apply matrix Azuma's inequality, see [121 Theorem 7.1]. For every j G [n], 
define the matrix-valued difi^erence sequence fj : [2] — >■ S^^" as fj{k) = {2{k—l) — l)Mj with < 1- Let 

X be a uniform random variable over the set [2]. Then Ex fj{X) = 0„. Set e = ^101n(4n)/n. Matrix- valued 



Azuma's inequality tells us that w.p. at least 1/2, a random set of signs {sj"}jg[n] 
Rescale the last inequality to conclude. 



satisfies 



< e. 



Proof, (of Theorem|4|) Using the notation of Algorithm[Tl for every i = 1,2, ... ,t, define recursively W{i) := 
9 X]}=i fji^'j ) ^^"^ ^^'^ potential function := 2tr (cosh [VK(i)]). For all steps i = 1, 2, . . . , i, we will prove 
that 



<|.W < #-1) exp (£VV7' 

Assume that the algorithm has fixed the first {i — 1) indices x'l, . 
on the expression of the argmin of Step 4 gives that 



■ 1 X 



{t-i) 



(5) 

An averaging argument applied 



Ex, 2tr (cosh [eW{i - 1) + 0f^{X,)]) = Ex, tr (exp [OV {W{i - 1)) + 0V {f^{X,))]) 

< tr (exp [V {9W{i - 1))] Ex. exp [V (ef^iX,))]) 

< tr (exp [V {eW{i - 1))] hn) exp (e^p'/j^) 

= <l>(^-i)exp(£V/7') 

where in the first inequality we used Lemma |3] and linearity of dilation, in the second inequality we used 
the Golden-Thompson inequality (Lemma[T]) and linearity of trace, in the third inequality we used Lemma[5] 
together with Lemma [7] and in the last equality we used again Lemma O Since the algorithm seeks the 
minimum of the expression in Step 4, it follows that < Ex. 2tr (exp [ev {W{i - 1)) + OV {f^{X,))]) 

which proves Ineq. ([5]). Apply t times Ineq. © to conclude that < exp • Recall that = 

2tr (cosh [0„]) = 2tr (I„) = 2n. On the other hand, we can lower bound 



= 2tr cosh 



> exp 



The last inequality follows since 2tr (cosh [C]) = 2 X]"=i cosh(Ai(C)) > 2 cosh (Amax(C))+2 cosh (Ainin(C)) > 
exp(||C||) for any matrix C G 5"^^" . Take logarithms on both sides and divide by 9, we conclude that 

E*-=i/iH) 



< 



ln(2n 



t^e^- Rescale by t the last inequality to conclude the proof. 



Proof, (of Lemma |8|) For notational convenience, set P := !„ — Jn/n and B := |X]ses(^('*) + ^i^) 
Since JR{g) = R{g)-i = J, we have that tr (cosh X^ses ) ^ tr (cosh [P_B]). Now using Lemma |4l it 
follows tr (cosh [PB]) = tr (Pcosh [B]+I- P) ^ tr (cosh [B]) + tr (-^cosh [B]+I- P) Notice that 3/n 
is a projector matrix, hence applying Lemmas I2I4I we get that 



tr 



cosh [B]+l- P] = tr (-cosh [J/nB] + P + I-P) = 1- cosh(6'|S'|). 



Proof, (of Lemma [S]) We will prove the Lemma for EE{A,9), the other case is similar. Let h := ()J2ses^ 
be a group algebra element of G, i.e, h G R[G]. Define exp [/i] :— id -I- X^feli TT ^'^'^ '^iW ■= id -|- 
Ylk=i 'TT (where /i*'^ is the fc-folded convolution/multiplication over IR[G']) the exponential operator and its I 
truncated Taylor series, respectively. Notice that 9A ~ 9 X^ses ^(*) ^ so EE{A, 9) ~ tr (exp [R{h)]) ~ 

tr (i?(exp [/i])). We will show that the quantity tr {R{Ti{h))) is a (5 approximation for EE{A,9) when I > 
max{\og{n/S),2c^\S\9}. 

Compute the sum of T; (h) by summing each term one by one and keeping track of all the coefficients of the 
group algebra elements. The main observation is that at each step there are at most n such coefficients since we 



12 



are working over R[G]. For fc > 1, compute the k-th term of the sum by (X^sss CsS^ /k\ = (X^sss Css)'^~^/(fc— 
1)! • Xses(cs/^)s. Assume that we have computed the first term of the above product, which is some group 
algebra clement denote it by Xggc Pad ^'^'^ some j3g S R. Hence, at the next iteration, we have to compute 
the product/convolution of X^gg 1^99 '^ith ^/^Xsss ^- "^hich can be done in ©(nlS*!) time. Since the sum 
has I terms, in total we require C(n|S'|Z) operations. Now, wc show that it is a (5 approximation. We need 
the following fact (see [H Theorem 10.1, p. 234]) 

Fact 11 For any B £ R"^", let Ti{B) XLo fr- \\exp[B] - Ti{B)\\ < ^(^j^dl^H. 

Notice that \\9A\\ = ||Xses ^^('*)|| — ^l-^*! by triangle inequality and the fact that ||i?(g)|| = 1 for any g e G. 
Applying Fact [TT] on 9 A we get that 

||exp[^A]-T,(M)||<M)!_ie«l^l < (^)'^'e^l^l 



i+(e|s|)/(;+i)^|5|y+i I J 



l + l J - 2'+i n 

where wc used the inequality {I + 1)! > {^^Y'^^ and the assumption that I > max{log(n/(5), 2e^0|S'|}. 

Lemma 13. Assume that the first (i — 1) indices, i < t have been fixed by Algorithmic Let '^j,*'' be the value 
of the potential function when the index k has been selected at the next iteration of the algorithm. Similarly, 
let be the ( approximate ) value of the potential function computed using Lemma within an additive 
error S > for all eigenvalues. Then, 

Proof. Let ti,t2, ■ ■ ■ ,Tn be the eigenvalues of ^{i-i} + ^(fc) ® Z(^i.y Let ti,t2, ■ • ■ ,tVi be the approximate 
eigenvalues of the latter matrix when computed via Lemma [T2] within an additive error J > 0, i.e, |rj — Tj \ < S 
for all j e [n] . 

First notice that, by Step 5 of Algorithm^ ^>|,'^ = 2 X^Li cosh(r, -Xi). Similarly, $[!^ := 2 cosh(?j - 

Xi). By the definition of the hyperbolic cosine, we get that 

n n 

cosh(fj -Xi)=Y^ cosh(rj - Xi + - Tj) 
1 " 

= 2 X! [*5^P(^J ~ -^*) G^v{rj - + exp(-Tj + Xi) exp(-Tj + t^)] . 

To derive the upper bound notice that Xj=i cosh(-rj— Ai) < Xj=i coshirj—Xi) maXjg[„]{exp(Tj— Tj), exp(— Tj + 
Tj)} and the maximum is upper bounded by exp((5). Similarly, for the lower bound. 

Proof, (of Theorem [6l) The proof consists of three steps: (a) wc show that Algorithm |3] is a reformulation of 
Algorithm [U (6) we prove that in Step 5 of Algorithm |3] it is enough to compute the values of the potential 
function within a sufficiently small multiplicative error using Lemma 1121 and (c) we give the advertised 
bound on the running time of Algorithm |31 

II 1 1 2 2 

Set Pi ~ ||^(j)|| / II^IIf' /(*) ~ ^{i) ® -^(i)/Pi ~ I" Si = \/pi for every i £ [to]. Observe that 
II A||p = tr (^^A) — tr (I„) ~ n. Let X be a random variable distributed over [to] with probability pi. Notice 

II 1 1 2 

that E /(A) = 0„ and 7 = n, since ||/(i)|| = nA^jj (g) A^jJ —In < n for every i S [to]. Moreover, 

a direct calculation shows that E/(A)^ = E {A(^x) ® ^(X)/Px)'^ — In = "Xilli ^(j) ® ^(i) ^ I" = ('^ ~ l)In, 
hence < n. Algorithm [T] with t = 0(nlnn/e^) returns indices x|,X2, . . . ,Xj so that j Xj=i fji^*j) — 
7in(2n) j^^^2 <- ncxt provc by induction that the same set of indices are also returned by Algorithm[3] 
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For ease of presentation, rescale every row of the input matrix A, i.e., set = ^(fc) \/ ^/Pk for every 
k S [to] (see Steps 2 and 3 of Algorithm [S]). For sake of the analysis, let us define the following sequence of 
self-adjoint matrices of size n 

T{o} 0„, 

T{i} := T{,_i} + (g) for i e [t] 

with eigenvalue decompositions T^ij = Q{i}^{i}Qji^, where /Ij-jj are diagonal matrices containing the 
eigenvalues and the columns of contain the corresponding eigenvectors. Set Q^oy = I and j^{o} = 
0. Notice that for every k g [m], by the eigenvalue decomposition of Ti^^_iy, T{j_]^} + A^^,) (g) A(^j^-^ = 
Q{i-i} (^{i-i} + ® Q{i-i}^(fe)) Observe that the above matrix (left hand side) and 

A{i-i} + Q{i-i}^(fe) ® Qji-i}^{k) have the same eigenvalues, since they are similar matrices. Let A^i^i^ + 
Qji-i}^{x*) ® Q{i_i}A(rE*) = ^{il^fil^fi} be its eigenvalue decompositioiH. Then 

It follows that Q{i} = Q{i-i}C^{j} for every z > 1, so Q{i} = C^{i}C^{2} ■ • ■ ^^{i}- The base case of the induction 
is immediate. Now assume that Algorithm [3] has returned the same indices as Algorithm [T] up to the (z — l)-th 
iteration. It suffices to prove that at the i-th iteration Algorithm [3] will return the index x*. 

We start with the expression in Step 4 of Algorithm [T] and prove that it's equivalent (up to a fixed 
multiplicative constant factor) with the expression in Step 5 of Algorithm O Indeed, for any k e [m], (let 

2tr (cosh [C + 0fik)]) = tr (exp [C + ef{k)] + exp [-C - ef{k)]) 
= tr (exp T{i_i} + (g l(fc) e-^' + exp -Ti^,-i} - A^k) ® ^(fc) e^*) 

where we used the definition of cosh [•], fii) and T^i_i^ and the fact that the matrices commute. In light of 
Algorithm [3] and the induction hypothesis, observe that the mx n matrix Z at the start of the t-th iteration 
of Algorithm [3] is equal to AU[iyU[2} ■ ■ ■ ^^{i-i} — ^Q{i~i}- Now, multiply the latter expression that appears 
inside the trace with Qji_iy from the left and Q{i-i} from the right, it follows that ((let C := 6'X]j=i fi^j))) 

2tr (cosh [C + 0f{k)]) = tr (exp [A{,_i} + ® Z^k)] c~'' + exp [-^{,-1} - Z^k) (g ^(fc)] e"') 

using that Q{i^i} are the eigenvectors of T^i^iy and the cyclic property of trace. This concludes part (a). 

Next we discuss how to deal with the technicality that arises from the approximate computation of the 
arg min expression in Step 5 of Algorithm [31 First, let's assume that we have approximately (by invoking 
Lemma I12|) minimized the potential function in Step 5 of Algorithm [3l denote this sequence of potential 
function values by 'P^^\ . . . , <P^*\ Next, we sufficiently bound the parameter b of Lemma[T2lso that the above 
approximation will not incur a significant multiplicative error. 

Recall that at every iteration, by Ineq. ([5]) there exists an index over [m] such that the current value of 
the potential function increases by at most a multiplicative factor exp (e^p^/7^). Lemma [T3l tells us that at 
every iteration of Algorithm [3] we increase the value of the potential function (by not selecting the optimal 
index over [to]) by at most an extra multiplicative factor c^^ , where 6 is the additive error when computing 
the eigenvalues of the matrix in Step 5 via Lemma [T2] Therefore, it follows that < exp{2St)<P^*K 

Observe that, at the i-th iteration we apply Lemm jT2] on a matrix ^{xj) ® ^{xj) for some indices 

Xj £ [to] and moreover J2j=i ^{xj) ® ^{xj) 



6 



by its definition, T^iy has the same eigenvalues with yl{,_i} + (3|^_j^j ® Q|^_ jj . 



14 



Triangle inequality tells us that J2]=i ^i^j) ® ^{xj) 

is at most 2'y9t for every i G [t]. It follows that S is at most 2~^'^^0tj where b is specified in Lemma [T2l The 
above discussion suggests that by setting b = 0{\og{6jt)) = C'(log(nlogn/e'^)) we can guarantee that the 
potential function < 2nexp (3ie^). This concludes part (6). 

Finally, we conclude the proof by analyzing the running time of Algorithm [3) Steps 2 and 3 can be done 
in 0{mn) time. Step 5 requires ©(mnlog^n) operations by invoking m times Lemma 1121 Steps 6 can be 
done in 0{n^) time and Step 7 requires 0{mnlog^ ii) operations by invoking Lemma [TTI In total, since the 
number of iterations is 0{nlogn/e^), the algorithm requires O {mn^ log^ n / e'^) operations. 

Proof, (of Theorem [7]) Assume without loss of generality that A has full rank. Define Ui = A^^/^Vi and 
notice that X^I^li Ui'^^Ui = I„. Run Algorithm [3] with input {wijigim] and e which returns {Ti},;<,„, at most 
t = C(nlogn/e^) of which are non-zero such that 



1 Ui - In 



< e. 



(6) 



Define A = A^^"^ (Sl^i '^i^i ® ""0 ^^^"^ ^ X]"=i TiVi®Vi- Eqn. ^ is equivalent to (1 — e)I„ ^ Y^T=i '''iUi®Ui ^ 
(1 + e)ln. Conjugating the latter expression by A^/^, see [20l § 7.7], we get that (1 - e)A < A ^ {I + e)A. 
Apply |391 Theorem 3.1] on A which outputs a matrix A = X^I^i SiVi®Vi with non-negative weights {si}ig[m] 
at most [ri/e^] of which are non-zero, such that (1 — e^A < A< [1 + eYA. Using the positive semi-definite 
partial ordering, we conclude that (1 - efA ^ A ^ (1 + efA. 

Proof, (of Theorem [8]) By homogeneity, assume that ||j4|| = 1. Following the proof of [M], we can assume 
that w.l.o.g. all non-zero entries of A have magnitude at least e/{2n) in absolute value, otherwise we can 
zero-out these entries and incur at most an error of e/2 (see [111 § 4.1]). 

Consider the bijection tt between the sets [n^] and [n] x [n] defined by 7r(Z) i—> ([Z/n] , (/ — 1) mod n + 1) 



for every I G [n?]- Let Eij £ 
V (^K(0 - A) where pi 



be the all zeros matrix having one only in the (i,j) entry. Set h{l) 



Al(i)l\\M\¥ for every I £ [n?]. Observe that h{-) € 



Let X be a 



random variable over [n^] with distribution pi, I G [n^]. The same analysis as in Lemmas 2 and 3 of |14| 
together with properties of the dilation map imply that ||/i(OII < 4nsr (A) /efor every Z G [n'^],Kh{X) = 02n, 
and ||E/i(X)2|| < nsr{A). 

Run Algorithm [1] with h{-) as above. Algorithm [T] returns at most t = 28?! ln(\/2n)sr (A) /e^ indices 
xl,X2, ■ ■ - xl over [n^] using 0{n^sr {A) \ogn/e^) operations such that 



1 * 

1=1 



h{x*i) 



< e/2. 



(7) 



Set A:= J X^Li "^^{xt)/ 



/pj.'E.^/^ty Observe that A has at most t non-zero entries. Now, by the definition of 



h(-) and properties of the dilation map, it follows that Incq. ([7]) is equivalent to 
e/2. 



A- A 



< 



Proof (of Lemma [ini) The key identity is CC^ J2 
follows that 



l,keln], Kk 



(j{l,k) ^ (jiLk)^ Lj,^ 1^1^ q ^j^jj / < fc, it 



= \Aik\ei (g)ei+ AikCk Ck + Aitet ® ei + \ Aik\ek ® e^. 



Therefore 



CC^ = ^ [\Aik\ei (}() ei + AikCk ® ek + Aiueu ' 

l,k£[n]: Kk 



ei + \Aik\ek Cfc] 



(8) 
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Let's first prove the equality for tfie off-diagonal entries of Eqn Let / < k and l,k G [n]. By construction, 
the only term of the sum that contributes to the and (j, i) entry of the right hand side of Eqn. (jH]) is 
the term C^^'^^ ® (^(^i) . Moreover, this term equals \Aij\ei (81 + AijCi ® ej + AijCj (81 e, + \ Aij\ej (8 Cj . Since 
Aij — Aji this proves that the off-diagonal entries are equal. 

For the diagonal entries of Eqn. (jj]), it suffices to prove that {CC^ ju = Ri. First observe that the last 
two terms of the sum in the right hand side of ([S]) do not contribute to any diagonal entry. Second, the first 
two terms contribute only when I — i or k ~ i. In the case where I ^ i, the contribution of the sum equals 



to X]i<fc l^ifel- On the other case (fc = i), the contribution of the sum is equal to ^j^j |A; 



However, A is 



self-adjoint so An = An for every ^ < i. It follows that the total contribution is J2 <k \^ik\ 

Proof. ( of Theorem [HI) In one pass over the input matrix A normalize the entries of A by ||^||, so assume 
without loss of generality that = 1. Let C be the n x m matrix guaranteed by Lemma 1101 where 
m = (2), each column of C is indexed by the ordered pairs i < j and A = CC^ + diag {A) — R. By 

definition of C and the hypothesis, wc have that ||C*C^|| = \\A - diag (A) + R\\ < \\A\\ + \\A\\^ < 2V0 and 
l|C|lF = 2E,,,l^.,|<2nP||^<2nv^. 



Consider the bijection between the sets [m] and {(i, j) \ i < j, i,j € [n]} defined by ■k{1) i-^ ([Z/n], (i — 1) 

mod n -t- 1). For each I e [m], set pi = ||C"^('^ f / I|C|If and define f{l) := C^C) ® C^('VPi " CC^ . Let 
X be a real-valued random variable over [to] with distribution pi. It is easy to verify that 'Ef(X) = 0„, 
l!/(OI! < 2||C||p for every I e [m]. A direct calculation gives that ||E/(X)2|| < 2 ||C||p ||CC^||. Matrix 
Bernstein inequality (see [42]) with /(•) as above (7 = AnVd and = 8n6) tells us that if we sample 
t = 38n01n(-\/2n)/£^ indices xl,xl, . . . over [m] then with probability at least 1 — 1/n, || 7 /(-^p 

e. Now, set C £ 



< 



^nxt j column of C^^^ equals -^C'^'-^i^ 



It follows that 7^5=1/^*) = 



CC^ - cc^ 



Define A = CC^ + diag (A) - R. First notice that 



A~A 



CC^ - CC^ 



< e. It suffices to bound the number of non-zeros of A. To do so, view the 



matrix-product CC^ as a sum of rank-one outer-products over all columns of C . By the special structure 
of the entries of C, every outer-product term of the sum contributes to at most four non-zero entries, two 
of which are off-diagonal. Since C has at most t columns, A has at most n 2t non-zero entries; n for the 
diagonal entries and 2t for the off-diagonal. 

Proof, (of Theorem [T0|) Let C be the n x m matrix such that A — CC^ + diag {A) — R and to < nnz (A) 
guaranteed by Lemma [Tni Apply Theorem [7] on the matrix CC^ and e which outputs, in deterministic 
0{nnz(A)n^e\og^n/e^+n^e^\ogn/e^) time, annx [n/e^] matrix C such that {l-efCC'^ ^ CC^ ^ (1 + 

e)^CC^ . By Weyl's inequality ^ Theorem 4.3.1] and the fact that e < 1/2, it follows that CC^ - CC^ < 

5e ||CC^||. Define A := CC^ + diag (A) — R. First we argue that the number of non-zero entries of A is at 
most n -I- [2n/e^] . Recall that every column of C is a rescaled column of C. Now, think the matrix-product 
CC^ as a sum of rank-one outer-products over all columns of C. By the special structure of the entries 
of C, every outer-product term of the sum contributes to at most four non-zero entries, two of which are 
off-diagonal. Since C has at most \n/e'^~\ columns, A has at most n + \2n/e'^~\ non-zero entries; n for the 
diagonal entries and \2n/e'^] for the off-diagonal. Moreover, A is close to A in the operator norm sense. 
Indeed, 



A^A 



CC^ - CC^ 
<5e{\\A\\ + \\Al 



< 5e||CC^|| = Sep -diag (A) 

) < lOeV^llAII 



R\\ 



where we used the definition of A, Eqn. (U), triangle inequality, the assumption that A is 6'-SDD and the 
fact that > I. Repeating the proof with e' = and elementary manipulations conclude the proof. 
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